An Adaptive Partitioning Scheme for Ad-hoc and Time-varying Database Analytics
Abstract
Data partitioning significantly improves query performance in distributed database systems. A large number of techniques have been proposed to efficiently partition a dataset, often focusing on finding the best partitioning for a particular query workload. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload. Furthermore, workloads change over time as businesses evolve or as analysts gain a better understanding of their data. Static workload-based data partitioning techniques are therefore not suitable for such settings. In this thesis, we present Amoeba, an adaptive distributed storage system for data skipping. It does not require an upfront query workload and adapts the data partitioning according to the queries posed by users over time. We present the data structures, partitioning algorithms, and an efficient implementation on top of Apache Spark and HDFS. Our experimental results show that the Amoeba storage system provides improved query performance for ad-hoc workloads, adapts to changes in the query workloads, and converges to a steady state in case of recurring workloads. On a real-world workload, Amoeba reduces the total workload runtime by 1.8x compared to Spark with data partitioned and 3.4x compared to unmodified Spark.
Thesis Supervisor: Samuel Madden
Title: Professor of Electrical Engineering and Computer Science
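To make the idea of data skipping concrete, here is a minimal sketch in Python. It is not Amoeba's implementation: the names Partition, build_partition, scan, and demo are hypothetical, the data is synthetic, and the "repartitioning" step is a plain sort standing in for whatever adaptive algorithm the thesis describes. It only illustrates why a partitioning aligned with the queried attribute lets a scan skip blocks by checking per-partition minimum and maximum values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Partition:
    """A block of rows plus per-attribute (min, max) metadata (hypothetical layout)."""
    rows: List[Dict[str, int]]
    bounds: Dict[str, Tuple[int, int]]


def build_partition(rows: List[Dict[str, int]]) -> Partition:
    # Record the value range of every attribute so scans can test overlap cheaply.
    attrs = rows[0].keys()
    bounds = {a: (min(r[a] for r in rows), max(r[a] for r in rows)) for a in attrs}
    return Partition(rows=rows, bounds=bounds)


def scan(partitions: List[Partition], attr: str, lo: int, hi: int):
    """Return rows with lo <= row[attr] <= hi and the number of partitions read."""
    matched, partitions_read = [], 0
    for p in partitions:
        p_lo, p_hi = p.bounds[attr]
        if p_hi < lo or p_lo > hi:
            continue  # value ranges do not overlap: skip the whole partition
        partitions_read += 1
        matched.extend(r for r in p.rows if lo <= r[attr] <= hi)
    return matched, partitions_read


def demo():
    # Synthetic table: 100 rows whose prices are a permutation of 0..99.
    rows = [{"id": i, "price": (i * 7) % 100} for i in range(100)]

    # Layout 1: four partitions by id range, unrelated to the price predicate.
    by_id = [build_partition(rows[i:i + 25]) for i in range(0, 100, 25)]

    # Layout 2: the same rows regrouped by price (a stand-in for repartitioning
    # after observing that queries filter on price).
    by_price_rows = sorted(rows, key=lambda r: r["price"])
    by_price = [build_partition(by_price_rows[i:i + 25]) for i in range(0, 100, 25)]

    hits_id, read_id = scan(by_id, "price", 0, 10)
    hits_price, read_price = scan(by_price, "price", 0, 10)
    return len(hits_id), read_id, len(hits_price), read_price


if __name__ == "__main__":
    print(demo())
```

Both layouts return the same rows for the price filter, but the layout grouped by price reads fewer partitions. That difference is the effect an adaptive scheme tries to obtain without knowing the workload upfront.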
Similar resources
ADAPTIVE ORDERED WEIGHTED AVERAGING FOR ANOMALY DETECTION IN CLUSTER-BASED MOBILE AD HOC NETWORKS
In this paper, an anomaly detection method in cluster-based mobile ad hoc networks with ad hoc on demand distance vector (AODV) routing protocol is proposed. In the method, the required features for describing the normal behavior of AODV are defined via step by step analysis of AODV and independent of any attack. In order to learn the normal behavior of AODV, a fuzzy averaging method is used fo...
ADAPTIVE FUZZY TRACKING CONTROL FOR A CLASS OF NONLINEAR SYSTEMS WITH UNKNOWN DISTRIBUTED TIME-VARYING DELAYS AND UNKNOWN CONTROL DIRECTIONS
In this paper, an adaptive fuzzy control scheme is proposed for a class of perturbed strict-feedback nonlinear systems with unknown discrete and distributed time-varying delays, and the proposed design method does not require a priori knowledge of the signs of the control gains. Based on the backstepping technique, the adaptive fuzzy controller is constructed. The main contributions of the paper...
Amoeba: A Shape changing Storage System for Big Data
Data partitioning significantly improves the query performance in distributed database systems. A large number of techniques have been proposed to efficiently partition a dataset for a given query workload. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload upfront. Furthermore, workloads change over time as ...
Robust Data Partitioning for Ad-hoc Query Processing
Data partitioning can significantly improve query performance in distributed database systems. Most proposed data partitioning techniques choose the partitioning based on a particular expected query workload or use a simple upfront scheme, such as uniform range partitioning or hash partitioning on a key. However, these techniques do not adequately address the case where the query workload is ad...
ADAPTIVE FUZZY OUTPUT FEEDBACK TRACKING CONTROL FOR A CLASS OF NONLINEAR TIME-VARYING DELAY SYSTEMS WITH UNKNOWN BACKLASH-LIKE HYSTERESIS
This paper considers the problem of adaptive output feedback tracking control for a class of nonstrict-feedback nonlinear systems with unknown time-varying delays and unknown backlash-like hysteresis. Fuzzy logic systems are used to estimate the unknown nonlinear functions. Based on the Lyapunov–Krasovskii method, the control scheme is constructed by using the backstepping and adaptive techniqu...
Publication date: 2016